Sified Stochastic Gradient Descent

نویسندگان

Maohua Zhu

Yuan Xie

Minsoo Rhu

Jason Clemons

Stephen W. Keckler

چکیده

Prior work has demonstrated that exploiting the sparsity can dramatically improve the energy efficiency and reduce the memory footprint of Convolutional Neural Networks (CNNs). However, these sparsity-centric optimization techniques might be less effective for Long Short-Term Memory (LSTM) based Recurrent Neural Networks (RNNs), especially for the training phase, because of the significant structural difference between the neurons. To investigate if there is possible sparsity-centric optimization for training LSTM-based RNNs, we studied several applications and observed that there is potential sparsity in the gradients generated in the backward propagation. In this paper, we investigate why the sparsity exists and propose a simple yet effective thresholding technique to induce further more sparsity during the LSTM-based RNN training. The experimental results show that the proposed technique can increase the sparsity of linear gate gradients to more than 80% without loss of performance, which makes more than 50% multiply-accumulate (MAC) operations redundant for the entire LSTM training process. These redundant MAC operations can be eliminated by hardware techniques to improve the energy efficiency and the training speed of LSTM-based RNNs.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...

متن کامل

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

The use of artificial neural networks has increased in many areas of engineering. In particular, this method has been applied to many geotechnical engineering problems and demonstrated some degree of success. A review of the literature reveals that it has been used successfully in modeling soil behavior, site characterization, earth retaining structures, settlement of structures, slope stabilit...

متن کامل

Comparison of Modern Stochastic Optimization Algorithms

Gradient-based optimization methods are popular in machine learning applications. In large-scale problems, stochastic methods are preferred due to their good scaling properties. In this project, we compare the performance of four gradient-based methods; gradient descent, stochastic gradient descent, semi-stochastic gradient descent and stochastic average gradient. We consider logistic regressio...

متن کامل

Conditional Accelerated Lazy Stochastic Gradient Descent

In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate O( 1 ε2 ) improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate O( 1 ε4 ).

متن کامل

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent variance. To remedy this problem, we introduce an explicit variance reduction method for stochastic gradient descent which we call stochastic variance reduced gradient (SVRG). For smooth and strongly convex functions, we prove that this method enjoys the same fast conv...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Sified Stochastic Gradient Descent

نویسندگان

چکیده

منابع مشابه

Identification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network

Conjugate gradient neural network in prediction of clay behavior and parameters sensitivities

Comparison of Modern Stochastic Optimization Algorithms

Conditional Accelerated Lazy Stochastic Gradient Descent

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

عنوان ژورنال:

اشتراک گذاری